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Abstract 



An algorithm for first-principles electronic structure calculations having 

a computational cost which scales linearly with the system size is presented. 

Our method exploits the real-space localization of the density matrix, and 

in this respect it is related to the technique of Li, Nunes and Vanderbilt. 

The density matrix is expressed in terms of localized support functions, and a 

matrix of variational parameters, Lq/3 having a finite spatial range. The total 

energy is minimized with respect to both the support functions and the Lap 

parameters. The method is variational, and becomes exact as the ranges of 

the support functions and L matrix are increased. We have tested the method 

on crystalline silicon systems containing up to 216 atoms, and we discuss some 

of these results. 
71.10.+X, 71.45.Nt 
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I. INTRODUCTION 



There has recently been rapidly growing activity in condensed matter simulation based 
on a quantum description of the electrons. The methods being used range from simple tight- 
binding mo delsS to full ab initio techniques^. Conventional electronic structure methods face 
severe difficulties for large systems, because the number of computer operations generally 
increases as the third power of the number of electrons. The development of methods for 
which the number of operations increases only linearly with the number of electrons (linear 
scaling methods) is an important target of current research. We describe here a promising 
first-principles linear-scaling method, which we have tested on silicon systems, and we 
present some results of these tests. 

The method we propose is closely related to several recently described techniques, par- 
ticularly that of Li et a/.i. The key idea of their method is that the electronic ground state 
should be determined by variation of the total energy with respect to the density matrix, 
linear-scaling being obtained by imposing a spatial cut-off on the density matrix. This 
approach is already becoming widely used in the tight-binding frameworkB. The method we 
describe is a generalization of the approach of Li et al. to first principles calculations. As we 
shall point out, the density-matrix approach is related to the technique of Mauri et a/.ii, 
which also has been highly successful in dynamical tight-binding simulations. The parallel 
implementation of linear-scaling tight-binding methods has also been recently reported!. 
Other linear-scaling methods less relevant to the present work have also been described in 
refs.Hi. 

The basic principles of our method are outlined in sec. and its practical implementation 
is described in sec. pT| . The results of our tests are presented in sec. Conclusions and 
suggestions for future developments are given in sec. 0. 
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II. BASIC PRINCIPLES 

The first-principles method we describe is based on density functional theory (DFT)I. 
Within DFT, the total energy Etot can be regarded as a functional of the occupied Kohn- 
Sham orbitals ipi{r), and the ground state can be obtained by minimizing the total energy 
with respect to these orbitals. Equivalently, the total energy can be treated as a functional 
of the density matrix p(r, r'), defined as: 

p(r,£') = E^^(i:)^r(l'), (1) 

i 

where the sum goes over all occupied orbitals. The ground state can then be obtained by 
minimizing Etot with respect to p{r,r^), subject to the conditions that p is idempotent, i.e.: 

p{l, i') = Jdr!' p{r, r") p(r", r'), (2) 

and that the number of electrons Nei has the correct value, the latter being given by: 

Nei = 2 fdrp{r,r). (3) 



Whether one works in terms of ipiiz) or in terms of p{r,r!), the essence of the calculation is 
to determine the occupied subspace. 

If one works with the density matrix, the idempotency condition is awkward to enforce 
directly, and it is more convenient to minimize subject to the condition that all its eigenvalues 
lie between and 1. This corresponds exactly to the commonly used device of working with 
variable occupation numbers in DFT0. This can be achieved following the strategy proposed 
by Li et a/.i, in which p is expressed as: 

p = 3a*a — 2a*a*a, (4) 

where cr{r,if) is an auxiliary 2-point function. Here the asterisk represents the continuum 
analog of matrix multiplication, so that e.g. the 2-point function a * cr(r, r') is given by: 

a*a{r,r!) = J dr['a{r,f).a{r^',r[). (5) 



The point here is that if A is an eigenvalue of a, then the corresponding eigenvalue of p is 
/(A) = 3 A^ — 2 A'^. This transformation guarantees that if a is nearly idempotent, p will be 
idempotent to an even better approximation. The process of minimizing Etot has the effect 
of driving the eigenvalues towards zero or unity, so that p is driven towards idempotency, as 
described in more detail by Li et a/.i. 

For practical first-principles calculations, o"(r, r') must be made separable, i.e. expressed 
in the form: 

cT{r,r[)=Y^ct>a{r)L^p<t>p{r[), (6) 

a,l3 

where the (/'a(r) will be referred to as support functions. It follows that p{r,r') is also 
separable: 

pir,r!) = J2UL)K^pML), (7) 

with the matrix K given by: 

K = 3 LSL - 2 LSLSL, (8) 

where Sap is the overlap matrix: 

Sap = J dr(j)a{r) (f)p{r). (9) 

In order to turn this into a linear-scaling method, we now require firstly that the support 
functions 0a (r) be non-zero only within localized spatial regions, referred to as support 
regions, and secondly that the matrix elements Lap be non-zero only if the corresponding 
regions are separated by less than a chosen cutoff distance Rcut- It is natural to impose 
these conditions, because in general p{r,r^) decays to zero as the separation |r — r' | goes to 
infinity. This implies that the calculation will become exact as the cutoff distance and the 
size of the support regions are increased. 

The strategy is now to minimize the total energy both with respect to the support 
functions and with respect to the Lap coefficients, subject only to the condition that the 
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number of electrons is held fixed at the required value. Since we are imposing constraints 
on the size of the support regions and the range of the L matrix, the calculation will be 
variational: the minimum energy is an upper bound to the true ground-state energy. 



III. PRACTICAL IMPLEMENTATION 



We have implemented the above general scheme in the local density approximation (LDA) 
using the pseudopotential technique!. The algorithm developed in the present work performs 
all calculations on a grid in real space. In this respect, our techniques have much in common 
with the real-space grid methods recently developed by Chelikowsky et al. for DFT- 
pseudopotential calculations. We work with periodic boundary conditions in order to avoid 
surface effects, but the technique could easily be applied with other boundary conditions. 
At present, our calculations are restricted to cubic repeating cells, and each cell is covered 
by a uniform cubic grid of spacing 5x. 

The support regions are chosen to be spherical with radius Rreg, and are centered on the 
atoms. Each region is associated with a certain number v of support functions, where v is 
the same for all regions. It is important to note that the total number of support functions 
must be at least half the number of electrons, but can be greater, and we exploit this 
freedom in the calculations described later. Each support function is represented by 

its values (pailS) on the grid points in its region. 

We now need to evaluate the various terms in the total energy, namely the kinetic 
energy Ek, the electron-pseudopotential Epg, the Hartree energy Eh and the exchange and 
correlation energy E^c- In an exact calculation E^ would be given by: 



We approximate this by replacing the Laplacian by a finite-difference approximation and 
the integration by a sum over grid points. In the terminology of Chelikowsky et a/.lii, we are 
currently using the second-order approximation, in which the calculation of 0a (e) at any 




(10) 



5 



grid point involves two points on either side in each cartesian direction. It is a simple matter 
to go to higher approximations, and the computational cost of doing so is not significant. 
It is important to note that this scheme gives non-zero (pa (l) values at grid points on 
which 0a (r) itself is zero, and it is essential to keep these values when calculating the matrix 
elements involved in Ek- 

The energies Epg, Eh and E^c all depend on the electron density n(r), whose value at 
grid point rg^ is: 



The pseudopotential energy is evaluated by multiplying n{r) by the total pseudopotential 
at each grid point and summing over the grid. (For present purposes, we are working with 
local pseudopotentials, although the extension to non-local pseudopotentials is straightfor- 
ward.) The LDA exchange-correlation energy is evaluated similarly by summing the values 
^(24) ea;c[?^(£f)], where exdn) is the exchange-correlation energy per electron at density n. 
The Hartree energy is evaluated in reciprocal space using the Fourier components of n(r^) 
obtained by discrete fast Fourier transform. 

The ground state is determined by minimization of the total energy with respect to 
both the support functions 0a (r) and the Lap coefficients, with the electron number held 
constant. We perform the minimization by the conjugate gradients method, and for this 
purpose we need analytical expressions for the derivatives dEtot/d(f)aiLi) and dEtot/dLap- 
These expressions are straightforward to derive, as will be described in more detail in a 
separate publication. The explicit formulas for these derivatives are: 



a/3 



(11) 



4 Y.[Kap{H(l)p){re) + ^{LHL)apMLe) 



(12) 



2 {LSLHL + LHLSL)a/3 Mu)] 



and 



dEtot 



6 {SLH + HLS)ap - 4 {SLSLH + SLHLS + HLSLS)a, 



(13) 



dLaf3 
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Here, {H (pp) (r^) denotes the function obtained by acting with the Kohn-Sham Hamiltonian 
on (p/Biz), evaluated at grid point r^. In the matrix products Hc,f3 is the matrix element of 
the Kohn-Sham Hamiltonian between support functions (paiz) and 0/3 (r). We stress that 
these are exact formulas for the derivatives of the discrete grid expressions for Efot- It is 
also worth noting that the formula for dEtot/ dLap is identical to what would be obtained 
in a tight-binding formulation with non-orthogonal basis functions. 

The linear-scaling behaviour arises from the spatial localization of the support functions, 
which implies that the overlap and Hamiltonian matrices Sq/j and H^ii vanish if the distance 
between the support functions exceeds a certain cutoff. With the cutoff we are imposing 
on Lapi this means that all matrices appearing in the expressions for Etot and its derivatives 
are sparse, and the number of non-zero elements grows linearly with the number of atoms. 

In practice, the minimization is currently performed by making a sequence of conjugate 
gradients steps for the L^js coefficients, followed by a sequence of steps for the support 
functions, repeating the alternation between these two types of variation. Ultimately, more 
efficient procedures may prove possible. 

In the tight-binding technique of Li et a/.i, the chemical potential rather than the number 
of electrons, Nf.i, was held constant. This is inconvenient, and we have preferred to hold Nf.i 
fixed during the minimization. To achieve this, we project the derivatives dEtot/ dL^p so 
that the resulting search direction is tangential to the local surface of constant iVg;, and after 
each displacement of Lap we make a correction to regain the correct N^i value. In performing 
this constrained minimization, there is considerable freedom in the choice of object function, 
and we find that it is convenient to minimize Etot — f^N^i, where /i is set equal to an estimate 
for the chemical potential. 

IV. PRACTICAL TESTS 

The total ground-state energy calculated by the above scheme converges to the correct 
value as the radius Rreg of the support regions and the value of the spatial cutoff radius 
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Rcut for the Lap coefficients are increased. Clearly, the practical usefulness of the method 
depends on the manner of this convergence. We must be able to obtain acceptable accuracy 
with manageable values for Rreg and Rcut- The size of the region and the value of the cutoff 
needed also determine the size of the system at which linear-scaling behaviour is obtained. 
To test these questions, we have performed calculations on repeating cells of perfect crystal 
silicon. 

The electron-core interactions are represented by the simple model pseudopotential due 
to Appelbaum and HamannEl. This is a local pseudopotential which is known to give a 
satisfactory representation of the energetics and electronic structure of crystalline silicon. 
The exchange-correlation energy is given by the Ceperley-Alder formula0. We have per- 
formed tests on systems of different sizes, using a grid spacing of 0.34 A. This is very similar 
to the spacing typically used in pseudopotential plane-wave calculations on silicon, and is 
sufficient to give reasonable accuracyEl. In all cases, we have found that the conjugate gra- 
dient method converges in a stable and fairly rapid way to the ground state. Generally, 50 
steps each of (paiz) variation and Lap variation are more than enough to achieve convergence 
of Etot to within 10~^ eV/atom. 

To examine the dependence of Etot on the region radius Rreg, we have done calculations 
on a system of 216 atoms. For this purpose, we have not imposed any cutoff on the Lap 
coefficients, but instead have determined them by exact diagonalization of the Hamiltonian 
matrix after every fifth displacement of the support functions. This is equivalent to using 
an infinite cutoff for the Lap coefficients. We stress that exact diagonalization is only used 
for the purpose of this test. It is clearly not a linear scaling operation, and in practical 
implementations is replaced by variation with respect to the Lap coefficients. Our results 
for Efot for five region sizes shown in Fig. [V| demonstrate that the convergence of Etot is very 
rapid, and that an accuracy of 0.05 eV/atom is reached for a region radius of 2.55 A. The 
conventional plane-wave method needs a plane-wave cutoff of ~ 12 Ry to achieve the same 
accuracy with commonly used pseudopotentials for silicon. This implies that linear-scaling 
behaviour for those parts of the calculation involving the calculation of Sap and Hap matrix 
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elements is reached for systems of roughly 100 atoms. 

The dependence of Etot on the cutoff radius Rcut for the La/3 coefficients has been studied 
for a system of 216 ions. Since we have shown that a region radius of 2.21 A provides good 
accuracy, we have used this length to perform this test. Here we have used conjugate 
gradient minimization with respect to both 0Q,(r) and La/3. The convergence of Etot with 
increasing R^ut in the L matrix is illustrated in Fig. 0, where it can be seen that this 
convergence is fairly rapid. The error in the total energy is less than 1% with Rcut = 6 A. 
It is worth noticing that we do not need longer cut-offs in the L matrix than are needed 
in the orthogonal tight-binding casei to obtain a similar degree of convergence in the total 
energy with a given support region radius. 



V. DISCUSSION 

We have shown that our proposed first-principles linear-scaling method is promising. 
The tests on c-Si show that the total energy converges rapidly as the size of the support 
regions and the cutoff radius are increased. For the time consuming parts of the calculation 
involving the computation of overlap and Hamiltonian matrix elements, the linear-scaling 
regime appears to be reached with only 100 atoms. Linear scaling is reached for the parts 
involving products of these sparse matrices requires larger systems, but these operations are 
essentially the same as would be needed in a tight-binding calculation. 

It should be noted that there are still problems that need to be addressed before a fully 
operational simulation code is written. One such problem concerns the calculation of the 
Hellmann-Feynman forces, and the possible need for Pulay correctional. We do not believe 
that this problem is particularly severe, and we hope to address it in a later publication. We 
also note that a considerable effort will be needed on code optimization before conclusions 
can be drawn about the speed of the method in practical problems. 

Finally we return to the relation between our method and previous linear-scaling 
schemes. Our reliance on the density matrix techniques proposed by Li et a/.i has already 
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been stressed. The close relation between these techniques and the approach of Mauri et al. 
for tight-binding calculations has been pointed out by Nunes and Vanderbiltlll. However, 
an important difference is that in our scheme there is considerable flexibility in choosing 
the number of support functions u, whereas in the scheme of Mauri et al. it appears to 
be necessary to take this number equal to half the number of electrons. We believe that 
the flexibility in our scheme may be an advantage, because we expect that increasing the 
number of support functions will allow one to reduce the size of the support region, and 
therefore the length of the cutoff for Lajj coefficients. 

The method we propose appears to be well suited to parallel computation, and we are 
currently investigating possible parallel implementations. 
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FIGURES 



Variation of the total energy with the region radius Rreg- 

Variation of the total energy with the Lap matrix range, Rcut- The quantity plotted is 
the error in the total energy per atom with respect to the value obtained with infinite Rcut- 



13 



